International Mathematical Olympiad


RIMO: An Easy-to-Evaluate, Hard-to-Solve Olympiad Benchmark for Advanced Mathematical Reasoning

Chen, Ziye, Qin, Chengwei, Shu, Yao

arXiv.org Artificial Intelligence

As large language models (LLMs) reach high scores on established mathematical benchmarks, such as GSM8K and MATH, the research community has turned to International Mathematical Olympiad (IMO) problems to push the evaluation frontier. However, existing Olympiad-level benchmarks suffer from practical constraints that introduce grading noise and potential bias, such as heterogeneous answer formats requiring model-based judges and a reliance on potentially flawed solutions. We introduce RIMO, a two-track benchmark designed to preserve peak Olympiad difficulty while eliminating this evaluation noise. The first track, RIMO-N, rewrites 335 IMO problems to admit a single, unique integer answer, allowing for deterministic correctness checking. The second track, RIMO-P, features 456 proof problems with expert-checked solutions, which are decomposed into a sequence of sub-problems to evaluate the step-by-step reasoning process via an automated grading system. Our benchmarking of ten frontier LLMs, including GPT-4o and Gemini 2.5 Flash, reveals that while these systems excel on older benchmarks, their performance drops sharply on RIMO. These results highlight a substantial gap between current LLM capabilities and actual Olympiad-level reasoning. By providing a challenging yet easy-to-evaluate suite, RIMO offers a high-resolution yardstick for future research, presenting a clear target for closing the profound reasoning gap our findings expose.
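To make the deterministic checking that RIMO-N enables concrete, here is a minimal sketch of an integer-answer grader in Python. The extraction heuristic and function names are illustrative assumptions, not RIMO's published harness.

```python
import re

def extract_integer(response: str) -> int | None:
    """Pull the final integer from a model response; this regex heuristic is
    an illustrative assumption, not RIMO's published extraction rule."""
    matches = re.findall(r"-?\d+", response.replace(",", ""))
    return int(matches[-1]) if matches else None

def grade(response: str, gold: int) -> bool:
    """Deterministic correctness check: exact integer match, no LLM judge."""
    predicted = extract_integer(response)
    return predicted is not None and predicted == gold

# A unique integer answer reduces grading to a single comparison.
assert grade("So the answer is 1,989.", 1989)
assert not grade("I get 1990.", 1989)
```

Because the gold answer is a single integer, correctness becomes an exact comparison rather than a judgment call, which is the evaluation noise the benchmark is designed to remove.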


The Mathematician's Assistant: Integrating AI into Research Practice

Henkel, Jonas

arXiv.org Artificial Intelligence

The rapid development of artificial intelligence (AI), marked by breakthroughs like 'AlphaEvolve' and 'Gemini Deep Think', is beginning to offer powerful new tools that have the potential to significantly alter research practice in many areas of mathematics. This paper explores the current landscape of publicly accessible large language models (LLMs) in a mathematical research context, based on developments up to August 2, 2025. Our analysis of recent benchmarks, such as MathArena and the Open Proof Corpus (Balunović et al., 2025; Dekoninck et al., 2025), reveals a complex duality: while state-of-the-art models demonstrate strong abilities in solving problems and evaluating proofs, they also exhibit systematic flaws, including a lack of self-critique and a model-dependent discrepancy between final-answer accuracy and full-proof validity. Based on these findings, we propose a durable framework for integrating AI into the research workflow, centered on the principle of the augmented mathematician. In this model, the AI functions as a copilot under the critical guidance of the human researcher, an approach distilled into five guiding principles for effective and responsible use. We then systematically explore seven fundamental ways AI can be applied across the research lifecycle, from creativity and ideation to the final writing process, demonstrating how these principles translate into concrete practice. We conclude that the primary role of AI is currently augmentation rather than automation. This requires a new skill set focused on strategic prompting, critical verification, and methodological rigor in order to use these powerful tools effectively.


DeepMind and OpenAI claim gold in International Mathematical Olympiad

New Scientist

Experimental AI models from Google DeepMind and OpenAI have achieved a gold-level performance in the International Mathematical Olympiad (IMO) for the first time. The companies are hailing the moment as an important milestone for AIs that might one day solve hard scientific or mathematical problems, but mathematicians are more cautious because details of the models' results and how they work haven't been made public. The IMO, one of the world's most prestigious competitions for young mathematicians, has long been seen by AI researchers as a litmus test for mathematical reasoning that AI systems tend to struggle with. After last year's competition, held in Bath, UK, Google DeepMind announced that AI systems it had developed, called AlphaProof and AlphaGeometry, had together achieved a silver medal-level performance, but its entries weren't graded by the competition's official markers. Before this year's contest, which was held in Queensland, Australia, companies including Google, Huawei and TikTok-owner ByteDance, as well as academic researchers, approached the organisers to ask whether they could have their AI models' performance officially graded, says Gregor Dolinar, the IMO's president.


TechScape: Will OpenAI's $5bn gamble on chatbots pay off? Only if you use them

The Guardian

What if you build it and they don't come? It's fair to say the shine is coming off the AI boom. Soaring valuations are starting to look unstable next to the sky-high spending required to sustain them.


DeepMind AI gets silver medal at International Mathematical Olympiad

New Scientist

DeepMind's AlphaProof AI can tackle a range of mathematical problems.

An AI from Google DeepMind has achieved a silver medal score at this year's International Mathematical Olympiad (IMO), the first time any AI has made it to the podium. The IMO is considered the world's most prestigious competition for young mathematicians. Correctly answering its test questions requires mathematical ability that AI systems typically lack. In January, Google DeepMind demonstrated AlphaGeometry, an AI system that could answer some IMO geometry questions as well as humans. However, this was not from a live competition, and it couldn't answer questions from other mathematical disciplines, such as number theory, algebra and combinatorics, an ability needed to win an IMO medal.


Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry

Sinha, Shiven, Prabhu, Ameya, Kumaraguru, Ponnurangam, Bhat, Siddharth, Bethge, Matthias

arXiv.org Artificial Intelligence

Proving geometric theorems constitutes a hallmark of visual reasoning combining both intuitive and logical skills. Therefore, automated theorem proving of Olympiad-level geometry problems is considered a notable milestone in human-level automated reasoning. The introduction of AlphaGeometry, a neuro-symbolic model trained with 100 million synthetic samples, marked a major breakthrough. It solved 25 of 30 International Mathematical Olympiad (IMO) problems, whereas the reported baseline based on Wu's method solved only ten. In this note, we revisit the IMO-AG-30 Challenge introduced with AlphaGeometry, and find that Wu's method is surprisingly strong. Wu's method alone can solve 15 problems, some of which are not solved by any of the other methods. This leads to two key findings: (i) Combining Wu's method with the classic synthetic methods of deductive databases and angle, ratio, and distance chasing solves 21 of the 30 problems using just a CPU-only laptop with a time limit of 5 minutes per problem. Essentially, this classic approach solves just four problems fewer than AlphaGeometry and establishes the first fully symbolic baseline strong enough to rival the performance of an IMO silver medalist. (ii) Wu's method even solves 2 of the 5 problems that AlphaGeometry failed to solve. Thus, by combining AlphaGeometry with Wu's method, we set a new state of the art for automated theorem proving on IMO-AG-30, solving 27 of 30 problems, making this the first AI method to outperform an IMO gold medalist.
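The complementarity described here is essentially a portfolio effect: each method proves a different subset of IMO-AG-30, and a problem counts as solved if any method in the portfolio proves it. The sketch below illustrates this; the problem IDs and overlap structure are invented so that the unions reproduce the abstract's counts, and the per-method solved sets are hypothetical stand-ins for real solver runs.

```python
# Sketch of the portfolio effect on the IMO-AG-30 problem set: a problem
# counts as solved if any method in the portfolio proves it. Problem IDs
# and overlaps are invented to reproduce the abstract's counts.

def portfolio_coverage(*solved_sets: set[str]) -> set[str]:
    """Union of per-method solved sets: the portfolio's total coverage."""
    covered: set[str] = set()
    for solved in solved_sets:
        covered |= solved
    return covered

wu = {f"p{i}" for i in range(1, 14)} | {"p26", "p27"}  # Wu's method: 15 solved
synthetic = {f"p{i}" for i in range(5, 20)}            # deductive DB + chasing
alphageometry = {f"p{i}" for i in range(1, 26)}        # AlphaGeometry: 25 solved

print(len(portfolio_coverage(wu, synthetic)))       # 21: classic symbolic combo
print(len(portfolio_coverage(wu, alphageometry)))   # 27: new state of the art
```

The unions make the headline numbers legible: Wu's method alone covers 15 problems, adding the classic synthetic methods lifts coverage to 21, and pairing Wu with AlphaGeometry reaches 27 of 30 because Wu proves 2 of AlphaGeometry's 5 failures.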


Can Artificial Intelligence Win Olympics?

#artificialintelligence

Data scientists are trying to build an AI system that can win a gold medal at the world's premier math competition. Indeed, researchers view the IMO as the ideal proving ground for machines designed to think like humans. If an AI system can excel here, it will have matched an important dimension of human cognition. "The IMO, to me, represents the hardest class of problems that smart people can be taught to solve somewhat reliably," said Daniel Selsam of Microsoft Research. Selsam is a founder of the IMO Grand Challenge, whose goal is to train an AI system to win a gold medal at the world's premier math competition. The International Mathematical Olympiad (IMO) is a mathematical olympiad for pre-college students and is the oldest of the International Science Olympiads.